    Semantically Aware Text Categorisation for Metadata Annotation

    In this paper we illustrate a system aimed at solving a longstanding and challenging problem: acquiring a classifier to automatically annotate bibliographic records, starting from a huge set of unbalanced and unlabelled data. We illustrate the main features of the dataset, the learning algorithm adopted, and how it was used to discriminate philosophical documents from documents of other disciplines. One strength of our approach lies in the novel combination of a standard learning approach with a semantic one: the results of the acquired classifier are improved by accessing a semantic network containing conceptual information. We describe the experiments, including the rationale behind the construction of the training and test sets, report and discuss the results obtained, and conclude by outlining future work.

    A Corpus of Potentially Contradictory Research Claims from Cardiovascular Research Abstracts

    Background: Research literature in biomedicine and related fields contains a huge number of claims, such as the effectiveness of treatments. These claims are not always consistent and may even contradict each other. Being able to identify contradictory claims is important for those who rely on the biomedical literature. Automated methods to identify and resolve them are required to cope with the amount of information available. However, research in this area has been hampered by a lack of suitable resources. We describe a methodology to develop a corpus which addresses this gap by providing examples of potentially contradictory claims, and demonstrate how it can be applied to identify these claims from Medline abstracts related to the topic of cardiovascular disease. Methods: A set of systematic reviews concerned with four topics in cardiovascular disease were identified from Medline and analysed to determine whether the abstracts they reviewed contained contradictory research claims. For each review, annotators were asked to analyse these abstracts to identify claims within them that answered the question addressed in the review. The annotators were also asked to indicate how the claim related to that question and the type of the claim. Results: A total of 259 abstracts associated with 24 systematic reviews were used to form the corpus. Agreement between the annotators was high, suggesting that the information they provided is reliable. Conclusions: The paper describes a methodology for constructing a corpus containing contradictory research claims from the biomedical literature. The corpus is made available to enable further research into this area and support the development of automated approaches to contradiction identification.

    Negated bio-events: Analysis and identification

    Background: Negation occurs frequently in scientific literature, especially in biomedical literature. It has previously been reported that around 13% of sentences found in biomedical research articles contain negation. Historically, the main motivation for identifying negated events has been to ensure their exclusion from lists of extracted interactions. However, recently, there has been a growing interest in negative results, which has resulted in negation detection being identified as a key challenge in biomedical relation extraction. In this article, we focus on the problem of identifying negated bio-events, given gold standard event annotations. Results: We have conducted a detailed analysis of three open access bio-event corpora containing negation information (i.e., GENIA Event, BioInfer and BioNLP'09 ST), and have identified the main types of negated bio-events. We have analysed the key aspects of a machine learning solution to the problem of detecting negated events, including selection of negation cues, feature engineering and the choice of learning algorithm. Combining the best solutions for each aspect of the problem, we propose a novel framework for the identification of negated bio-events. We have evaluated our system on each of the three open access corpora mentioned above. The performance of the system significantly surpasses the best results previously reported on the BioNLP'09 ST corpus, and achieves even better results on the GENIA Event and BioInfer corpora, both of which contain more varied and complex events. Conclusions: Recently, in the field of biomedical text mining, the development and enhancement of event-based systems has received significant interest. The ability to identify negated events is a key performance element for these systems. We have conducted the first detailed study on the analysis and identification of negated bio-events. Our proposed framework can be integrated with state-of-the-art event extraction systems. The resulting systems will be able to extract bio-events with attached polarities from textual documents, which can serve as the foundation for more elaborate systems that are able to detect mutually contradicting bio-events. © 2013 Nawaz et al.; licensee BioMed Central Ltd.